expr: fix non-monotone annotations on timestamp/date/interval functions by antiguru · Pull Request #36702 · MaterializeInc/materialize

antiguru · 2026-05-24T02:07:38Z

Motivation

Fixes database-issues#9656 — the persist filter pushdown correctness violation! audit panic that has been reproducing under parallel-workload --scenario regression --complexity ddl.

Description

Interval is lex-ordered by (months, days, micros) (derived Ord), but several scalar functions that consume an interval do calendar-aware arithmetic (variable-length months, day-clamping, bin alignment to the unix epoch) which does not respect that ordering. The interpreter behind persist filter pushdown was marking these as monotone, so it mapped only the endpoints of an interval range — and missed interior values whose function results fall outside the endpoint-bounded box. Filter pushdown then incorrectly concluded the part had no matching rows, tripping the audit panic in persist_source.rs:625.

Bug 1: add/sub_{timestamp,timestamp_tz,date}_interval

For t = 2024-01-31:

interval	lex order	`t + i`
`{0 months, 31 days}`	smaller	`2024-03-02`
`{1 month, 0 days}`	larger	`2024-02-29`

So the smaller interval produces the larger output. For timestamps, day-clamping also collapses near-boundary inputs into the same date while preserving sub-day time, breaking monotonicity in the first argument too:

2024-01-30 23:59:59 + 1 month  →  2024-02-29 23:59:59
2024-01-31 00:00:00 + 1 month  →  2024-02-29 00:00:00

For dates the first argument is monotone (no sub-day precision means clamping only collapses, never reverses), but the interval argument has the same problem.

Demotions:

add_timestamp_interval, add_timestamp_tz_interval, sub_timestamp_interval, sub_timestamp_tz_interval → (false, false)
add_date_interval, sub_date_interval → (true, false)

Bug 2: date_bin_{timestamp,timestamp_tz}

date_bin(stride, source) = origin + floor((source - origin) / stride) * stride. For a fixed source like 2024-01-01 12:00:00, a 1-day stride bins to 2024-01-01 00:00:00 but a 2-day stride bins to 2023-12-31 00:00:00 — the lex-larger stride produces an earlier output, because alignment to the unix epoch depends on stride magnitude, not lex order. Demoted from (true, true) to (false, true) (still monotone in source).

Context

This is the companion fix to b607993 ("Mark some interval-related functions as non-monotone"), which corrected the analogous annotations for the *_time_interval and *_interval (interval × scalar) variants but left the calendar-aware variants untouched. Together with the cases already fixed in #9301, this completes the sweep of interval-consuming functions whose result doesn't respect interval lex order.

The second commit removes the persist_stats_filter_enabled: "false" override in parallel_workload/settings.py that was added to suppress this audit panic, so CI re-exercises the audit and surfaces any remaining latent causes.

On the "numeric trim" suspicion in the issue thread

I traced every is_monotone annotation on numeric functions (arithmetic, casts, rounding, etc.) and they all look correct. I think the audit failures previously attributed to numeric trimming were actually the timestamp/date+interval bug class showing up in mixed expressions — parallel-workload going green with only this fix supports that read.

Verification

New regression tests in interpret.rs:
- test_add_timestamp_interval_non_monotone — constructs (2024-01-31 + interval_col) >= 2024-03-15 with interval_col ranging over [{0m,31d}, {1m,0d}]. Under the buggy annotation the interpreter ruled out Datum::True (and so pushdown would have skipped the part); with the fix it correctly admits True.
- test_date_bin_timestamp_non_monotone — constructs date_bin(stride_col, 2024-01-01 12:00:00) >= 2024-01-01 with stride_col ranging over [1 day, 2 days]. Asserts the interpreter admits both True (1-day stride) and False (2-day stride).
cargo test -p mz-expr --lib — all 50 tests pass locally, including the proptest scalar::func::test::test_is_monotone.
Parallel-workload override removed; the user has confirmed the parallel-workload regression scenario runs green with this fix.

Tradeoff and follow-up

The non-monotone marking on sub_timestamp_interval and friends is conservative — even a day-only interval shares the annotation with month-bearing intervals because the static interpreter can't distinguish them. As a result, EXPLAIN ... WITH (filter pushdown) no longer reports a pushdown= line for predicates like timestamp_col - INTERVAL '1' day < literal (the slt test in this PR reflects that). That class of predicate was a motivating use case for filter pushdown, so the regression is addressed in #36706, which stacks on this PR and recovers the pushdown via a SpecialBinary dynamic-monotonicity check on the interval value.

`Interval` is lex-ordered by (months, days, micros), but adding an interval to a timestamp or date adds *calendar* months with day-clamping and then days as 24-hour periods. That arithmetic does not respect the lex order: t = 2024-01-31 i1 = {0 months, 31 days} → t + i1 = 2024-03-02 i2 = {1 month, 0 days} → t + i2 = 2024-02-29 In lex order `i1 < i2`, yet `t + i1 > t + i2`. For timestamps, the day-clamping also collapses near-boundary inputs into the same date while preserving sub-day time, so the first argument is non-monotone too: t1 = 2024-01-30 23:59:59, i = {1 month} → 2024-02-29 23:59:59 t2 = 2024-01-31 00:00:00, i = {1 month} → 2024-02-29 00:00:00 For dates the first argument *is* monotone (no sub-day precision means clamping only collapses, never reverses), but the interval argument has the same problem. These annotations are consumed by the abstract interpreter that drives persist filter pushdown. Marking these functions monotone meant the interpreter computed the output range by evaluating the function only at the endpoints of the input interval range — and a stats range like `[{0m,31d}, {1m,0d}]` would yield the narrow output range `[2024-02-29, 2024-03-02]` even though interior intervals (e.g. `{0m, 60d}`) actually produce timestamps far outside that window. Filter pushdown could then incorrectly conclude a part had no matching rows, tripping the `persist filter pushdown correctness violation!` audit in `persist_source.rs`. Companion fix to b607993, which corrected the analogous annotations for `add_time_interval`, `sub_time_interval`, `mul_interval`, and `div_interval`. Fixes database-issues#9656.

The override was added to suppress the audit panic from database-issues#9656; with the monotonicity annotations on timestamp/date + interval corrected, filter pushdown should be sound again. Removing this lets CI re-exercise the audit and surface any remaining latent causes.

- Drop useless .into() in the new interpret.rs regression test (clippy). - Update filter-pushdown.slt: sub_timestamp_interval is no longer pushdownable now that it's marked non-monotone, so the expected pushdown= lines for 'timestamp - INTERVAL day' queries are gone. This is the known tradeoff for soundness — even day-only intervals share the non-monotonic annotation since the abstract interpreter can't distinguish them statically.

def-

Amazing! I'll rebase my workload-replay change on top of this and verify: #36686

Also triggered a nightly run: https://buildkite.com/materialize/nightly/builds/16533

def- · 2026-05-24T04:21:35Z

+    /// the unix epoch depends on the stride magnitude rather than on lex order.
+    #[mz_ore::test]
+    #[cfg_attr(miri, ignore)]
+    fn test_date_bin_timestamp_non_monotone() {


Does it actually fail without the PR? The QA LLM review claims not, but I didn't verify:

MEDIUM — test_date_bin_timestamp_non_monotone is a tautology and does not actually regression-test the fix

File: src/expr/src/interpret.rs:1844-1903

The test sets up date_bin(stride_col, 2024-01-01 12:00:00) >= 2024-01-01 00:00:00 with stride_col ∈ [{0,1,0}, {0,2,0}] and asserts that the interpreter admits both True and False. The endpoint evaluations are:

date_bin({0,1,0}, 2024-01-01 12:00) = 2024-01-01 00:00 → >= is True.

date_bin({0,2,0}, 2024-01-01 12:00) = 2023-12-31 00:00 → >= is False.

Because the endpoints already produce opposite boolean answers under the (buggy) (true, true) annotation, the interpreter's union of {False} and {True} already admits both outcomes. The test therefore passes regardless of whether date_bin_timestamp is marked (true, true) or (false, true).

Verified empirically: I reverted src/expr/src/scalar/func.rs:2057 to is_monotone = "(true, true)" while leaving the rest of the PR in place, and ran cargo test --lib -p mz-expr test_date_bin_timestamp_non_monotone — the test still passes.

In contrast, the companion test test_add_timestamp_interval_non_monotone is genuinely diagnostic — reverting the add_timestamp_interval annotation makes it fail with interpreter incorrectly ruled out matching rows.

The in-test comment is also self-contradictory: it asserts "both endpoint evaluations give the same boolean answer" while the lines just above explicitly list one endpoint satisfying >= and the other not.

Impact: The fix to date_bin_timestamp / date_bin_timestamp_tz is correct, but the test that's supposed to lock it in provides false assurance. A later refactor that re-promotes the annotation would not be caught by cargo test -p mz-expr --lib (the PR description points to this command as the verification gate). Given that filter-pushdown correctness bugs are P1/test-blocker (see #9656), losing the regression coverage is a real durability concern.

Suggested fix: Compare against a timestamp strictly between the two endpoint outputs so that the buggy lex-mapping rules out the matching-rows case. For example:

// Endpoint outputs are 2023-12-31 00:00 and 2024-01-01 00:00. An interior // stride of `{0, 1 day, 12h-worth-of-micros}` bins source to 2024-01-01 12:00, // which is outside the lex-endpoint box. Comparing against 2024-01-01 12:00:00 // forces the interpreter to (wrongly, under the buggy annotation) rule out // True. let expr = MirScalarExpr::column(0) .call_binary(ts_lit("2024-01-01T12:00:00"), DateBinTimestamp) .call_binary(ts_lit("2024-01-01T12:00:00"), Gte); // ... same column setup ... assert!( range_out.may_contain(Datum::True), "date_bin is not monotone in the stride argument; \ interpreter must not rule out matching rows", );

Under (true, true), the lex range [2023-12-31, 2024-01-01] is entirely < 2024-01-01 12:00:00, so the interpreter would only admit False and the may_contain(True) assertion would fail — making it a real regression test. Under the fix's (false, true), the stride flat-map yields anything(), both outcomes are reachable, and the assertion passes.

(Concretely, an interior stride such as {0, 1, 43200000000} — 1 day + 12 hours — does bin 2024-01-01 12:00:00 to itself, so the predicate is genuinely achievable. The test doesn't need to construct that value; it just needs to assert the interpreter doesn't rule out True.)

Great find! Fixed in the latest commit.

date_bin(stride, source) = origin + floor((source - origin) / stride) * stride. For a fixed source like 2024-01-01 12:00:00, a 1-day stride bins to 2024-01-01 00:00:00 but a 2-day stride bins to 2023-12-31 00:00:00 — i.e. the lex-larger stride produces an earlier output. Same class of bug as the timestamp/date + interval monotonicity issues. Demotes date_bin_timestamp and date_bin_timestamp_tz from (true, true) to (false, true). Still monotone in source.

def- · 2026-05-24T13:04:44Z

With this PR I'm still seeing the correctness panic: https://buildkite.com/materialize/release-qualification/builds/1250 I did find another non-monotone function though: #36708 Trying now: https://buildkite.com/materialize/release-qualification/builds/1251 Edit: Still the same panic.

antiguru · 2026-05-25T01:58:06Z

Claude found another class of problem, which is error handling: #36721

def- · 2026-05-25T02:10:05Z

Nice, next try with all 3 PRs included: https://buildkite.com/materialize/release-qualification/builds/1252 Edit: Failed, but only because we run a benchmark against main, and of course it crashes on main! Now trying without main: https://buildkite.com/materialize/release-qualification/builds/1256 Looking much better so far

DAlperin

Lgtm thanks

antiguru · 2026-05-26T19:00:13Z

Thanks for the reviews!

See commit, based on top of #36702 Postgres-style `age(a, b)` is annotated `is_monotone = "(true, true)"`, but it is non-monotone in either argument. First argument: the carry logic in the calendar-aware difference borrows a whole month's worth of days when the day field goes negative, which breaks the lex order of `Interval` (months, days, micros) at month boundaries: age(2024-03-31, 2024-02-15) = {1 month, 16 days} age(2024-04-01, 2024-02-15) = {1 month, 15 days} ← lex-smaller age(2024-05-01, 2024-02-15) = {2 months, 15 days} Second argument: after the algorithm's final sign-revert, the same day-borrow makes the result dip non-monotonically as `b` crosses a month boundary: age(2024-02-29, 2024-03-30) = {-1 month, -1 day} age(2024-02-29, 2024-03-31) = {-1 month, -2 days} ← lex-smaller age(2024-02-29, 2024-04-01) = {-1 month, -1 day} Demotes both `age_timestamp` and `age_timestamp_tz` to `(false, false)`. Adds regression tests in `interpret.rs` exercising the abstract interpreter via a `>=` predicate, mirroring the existing tests for the analogous `add_timestamp_interval` and `date_bin_timestamp` bugs. Same bug class as 5257b0d and 51e86ed.

…36721) ### Motivation Even with the monotonicity fixes in #36702 and #36708 (both merged), workload-replay was still hitting `persist filter pushdown correctness violation!`. The audit log on that failure (database-issues#9656, `u103`) showed: ``` mfp = MapFilterProject { expressions: [CastStringToUuid(Column(3, "merchant_group_id"))], predicates: [(2, Not(IsNull(Column(1, "checksum"))))], projection: [0, 1, 2, 4, 5, 6], input_arity: 6 } upper_bounds: [cast_timestamp_tz_to_mz_timestamp(Coalesce(deleted_at, '9999-12-31'))] stats: checksum lower="bar" upper="bar"; deleted_at = 2025-06-16 (single value) result: Err((DataflowErrorSer(143), 1779629555000, +1)) ``` The interpreter computed: - predicate `NOT IsNull(checksum)` → `{True}` (stats say non-null). - upper-bound check `2025-06-16 >= mz_now` → `{False}` (frontier was past 2025-06-16). - `AND({True}, {False}) = {False}`, `fallible = false`. → `may_keep=false, may_error=false, may_skip=true` → **Discard the part**. But the actual MFP runtime ran the predicate (True), then evaluated the `cast_string_to_uuid("merchant_group_id_a")` expression (NOT a valid UUID), which errored, and the whole row was emitted as `Err`. The discarded part actually had an error row to emit. ### Description This is a different bug class from the monotonicity fixes (#36702 / #36708); it stands on its own. The runtime evaluator (`SafeMfpPlan::evaluate_inner`) runs every MFP expression once all preceding predicates pass; any expression error propagates as the row's error result. The abstract interpreter's `mfp_filter` / `mfp_plan_filter`, however, only ANDs together the predicates and temporal bounds — so the AND result misses the fallibility of any expression whose result column isn't referenced by a predicate or bound. This PR overrides `mfp_filter` and `mfp_plan_filter` on `ColumnSpecs` to set the returned summary's `fallible` bit if any of the MFP expressions' specs are fallible. Conservative (we'll keep parts where a predicate could-but-doesn't-have-to fail, even though predicate short-circuiting would have prevented the expression from running) but sound and matches the runtime semantics. The default trait-level implementation (used by `Trace`) is unchanged, since `Trace` is about pushdownability rather than soundness. ### Verification - New regression test `interpret::tests::test_mfp_unreferenced_fallible_expression`: builds an MFP with one always-erroring expression (`cast_string_to_uuid("not-a-uuid")`) and one always-passing predicate, asserts that the interpreter's summary `may_fail()`. Fails on the pre-fix code; passes on the fix. - New proptest `interpret::tests::test_mfp_filter_fallibility_equivalence`: for a random expression placed unreferenced in `MapFilterProject::expressions`, verifies that whenever runtime evaluation on a row from the stats range errors, the interpreter's summary reports `may_fail()` — directly checking the soundness claim against the MFP impl. - `cargo test -p mz-expr --lib` — all tests pass. - Workload-replay should stop hitting the audit panic on this class of MFP (cast/parse expression columns that aren't gated by a predicate). ### Cost Any MFP with a fallible expression in its `expressions` list will now keep parts that would previously have been discarded by temporal-bound checks. The tighter version would only mark fallibility when the predicates' spec admits `True` (so the expression would actually run at runtime); that's a follow-up if the conservative version costs too much in practice. --------- Co-authored-by: Claude <noreply@anthropic.com>

claude added 3 commits May 24, 2026 02:02

antiguru marked this pull request as ready for review May 24, 2026 02:50

antiguru requested review from a team as code owners May 24, 2026 02:50

antiguru requested review from DAlperin and def- May 24, 2026 02:53

antiguru changed the title ~~expr: mark timestamp/date + interval as non-monotone~~ expr: fix non-monotone annotations on timestamp/date/interval functions May 24, 2026

antiguru mentioned this pull request May 24, 2026

expr: recover pushdown for day-only interval predicates #36706

Open

def- approved these changes May 24, 2026

View reviewed changes

def- reviewed May 24, 2026

View reviewed changes

antiguru force-pushed the claude/github-issue-9656-4DVQc branch from 51e86ed to fa54894 Compare May 24, 2026 11:18

antiguru force-pushed the claude/github-issue-9656-4DVQc branch from fa54894 to eb53ce5 Compare May 24, 2026 12:15

def- mentioned this pull request May 24, 2026

expr: age is non-monotone in both arguments #36708

Merged

antiguru mentioned this pull request May 25, 2026

expr: propagate MFP expression fallibility through filter pushdown #36721

Merged

DAlperin approved these changes May 26, 2026

View reviewed changes

antiguru merged commit 5a511a1 into main May 26, 2026
118 checks passed

antiguru deleted the claude/github-issue-9656-4DVQc branch May 26, 2026 15:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

expr: fix non-monotone annotations on timestamp/date/interval functions#36702

expr: fix non-monotone annotations on timestamp/date/interval functions#36702
antiguru merged 4 commits into
mainfrom
claude/github-issue-9656-4DVQc

antiguru commented May 24, 2026 •

edited

Loading

Uh oh!

def- left a comment •

edited

Loading

Uh oh!

def- May 24, 2026

Uh oh!

antiguru May 24, 2026

Uh oh!

def- commented May 24, 2026 •

edited

Loading

Uh oh!

antiguru commented May 25, 2026

Uh oh!

def- commented May 25, 2026 •

edited

Loading

Uh oh!

DAlperin left a comment

Uh oh!

Uh oh!

antiguru commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

antiguru commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Description

Verification

Tradeoff and follow-up

Uh oh!

def- left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

def- May 24, 2026

Choose a reason for hiding this comment

MEDIUM — test_date_bin_timestamp_non_monotone is a tautology and does not actually regression-test the fix

Uh oh!

antiguru May 24, 2026

Choose a reason for hiding this comment

Uh oh!

def- commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

antiguru commented May 25, 2026

Uh oh!

def- commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DAlperin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

antiguru commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

antiguru commented May 24, 2026 •

edited

Loading

def- left a comment •

edited

Loading

MEDIUM — `test_date_bin_timestamp_non_monotone` is a tautology and does not actually regression-test the fix

def- commented May 24, 2026 •

edited

Loading

def- commented May 25, 2026 •

edited

Loading